Template Sampling for Leveraging Domain Knowledge in Information Extraction

Authors

  • Christopher Cox
  • Jamie Nicolson
  • Jenny Rose Finkel
  • Christopher Manning
Abstract

We first describe a feature-rich discriminative Conditional Random Field (CRF) model for Information Extraction in the workshop announcements domain, which offers good baseline performance in the PASCAL shared task. We then propose a method for leveraging domain knowledge in Information Extraction tasks: we generate sample labellings from a trained sequence classifier, then score each candidate document labelling as a one-value-per-field template according to its domain feasibility. Our relational models evaluate these templates according to our intuitions about agreement in the domain: workshop acronyms should resemble their names, and workshop dates should occur after paper submission dates. These methods yield a 5% F-score improvement in fields retrieved when sampling labellings from a Maximum-Entropy Markov Model; however, we do not observe an improvement over a CRF model. We discuss reasons for this, including the problem of recovering all field instances from a best template, and propose future work in adapting such a model to the CRF, a better standalone system.
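The template-scoring idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the field names (`name`, `acronym`, `submission_date`, `workshop_date`), the helper functions, the +1/-1 scoring, and the way sample frequency is combined with the feasibility score are all assumptions made for illustration. The sampled templates are assumed to come from some trained sequence classifier, which is not shown.

```python
from collections import Counter
from datetime import date


def acronym_matches(acronym, name):
    """Return True if the acronym's letters appear, in order, among the
    initials of the words in the name (a hypothetical agreement test)."""
    initials = iter(w[0].upper() for w in name.split() if w[0].isalpha())
    # `ch in initials` consumes the iterator, so this is a subsequence check.
    return all(ch in initials for ch in acronym.upper() if ch.isalpha())


def feasibility(template):
    """Score a one-value-per-field template with two illustrative domain
    rules; the +1/-1 weights are assumptions, not values from the paper."""
    score = 0.0
    acronym, name = template.get("acronym"), template.get("name")
    if acronym and name:
        score += 1.0 if acronym_matches(acronym, name) else -1.0
    submitted, held = template.get("submission_date"), template.get("workshop_date")
    if submitted and held:
        score += 1.0 if submitted < held else -1.0
    return score


def best_template(samples):
    """Pick the template maximizing sample frequency plus feasibility.
    Frequency stands in for the classifier's probability (an assumption)."""
    counts = Counter(tuple(sorted(t.items())) for t in samples)

    def total(item):
        key, n = item
        return n / len(samples) + feasibility(dict(key))

    key, _ = max(counts.items(), key=total)
    return dict(key)


# Hypothetical sampled templates: the infeasible one is sampled more often.
good = {"name": "Workshop on Machine Learning", "acronym": "WML",
        "submission_date": date(2005, 3, 1), "workshop_date": date(2005, 6, 1)}
bad = {"name": "Workshop on Machine Learning", "acronym": "ICML",
       "submission_date": date(2005, 6, 1), "workshop_date": date(2005, 3, 1)}
chosen = best_template([bad, bad, good])
```

In this toy run the less frequent template is chosen because it satisfies both domain rules, which is the kind of re-ranking the abstract describes. Recovering every field instance in the document from the chosen template is a separate step that the sketch does not address.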

Similar resources

Presenting a method for extracting structured domain-dependent information from Farsi Web pages

Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...

Extracting Template for Knowledge-based Question-Answering Using Conditional Random Fields

In this paper, we present an information extraction system that extracts template elements for a question-answering (QA) system in the domain of encyclopedia. We use Conditional Random Fields to extract templates from the texts of an encyclopedia. Using the proposed approach, we could achieve a 74.89% precision and a 55.77% F1 in the template extraction. In the question classification, we could...

Attribute Relation Extraction from Template-inconsistent Semi-structured Text by Leveraging Site-level Knowledge

A variety of methods have been proposed for attribute-value extraction from semi-structured text with consistent templates (strict semi-text). However, when the templates in semi-structured text are inconsistent (weak semi-text), these methods work poorly. To overcome the template-inconsistency problem, in this paper we propose a novel method to leverage site-level knowledge for attribute-va...

Focusing on Scenario Recognition in Information Extraction

This paper reports a research effort in Information Extraction, especially in template pattern matching. Our approach uses rich domain knowledge in the football (soccer) area and logical form representation for the necessary inferences of facts and template filling. Our system FRET (Football Reports Extraction Templates) is compatible with the language-engineering environment GATE and handles its i...

Publication date: 2005